
    Factoring out ordered sections to expose thread-level parallelism

    With the rise of multi-core processors, researchers are taking a new look at extending the applicability of auto-parallelization techniques. In this paper, we identify a dependence pattern on which auto-parallelization currently fails. This dependence pattern occurs for ordered sections, i.e. code fragments in a loop that must be executed atomically and in the original program order. We discuss why these ordered sections prevent current auto-parallelizers from working and we present a technique to deal with them. We experimentally demonstrate the efficacy of the technique, yielding significant overall program speedups.
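
    As an illustration of the dependence pattern described above, the sketch below shows a loop whose body is largely parallel except for one ordered section. It uses OpenMP's ordered construct as a stand-in for the paper's setting (the paper targets auto-parallelization, not OpenMP), and compute() and emit() are hypothetical placeholder functions.

        /* compile with: cc -fopenmp ordered.c */
        #include <stdio.h>

        /* Hypothetical stage functions: compute() is independent per
         * iteration; emit() must run atomically and in original
         * iteration order -- the "ordered section". */
        static double compute(int i) { return i * 0.5; }
        static void emit(int i, double v) { printf("%d %f\n", i, v); }

        int main(void) {
            #pragma omp parallel for ordered
            for (int i = 0; i < 100; i++) {
                double v = compute(i);   /* runs in parallel, any order */
                #pragma omp ordered
                emit(i, v);              /* serialized, program order   */
            }
            return 0;
        }

    As the title suggests, factoring the ordered section out of the loop body is what exposes the remaining, fully independent work to thread-level parallelism.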

    Can we apply accelerator-cores to control-intensive programs?

    There is a trend towards using accelerators to increase the performance and energy efficiency of general-purpose processors. So far, most accelerators have been built with HPC applications in mind. A question that arises is how well other applications can benefit from these accelerators. In this paper, we discuss the acceleration of three benchmarks using the SPUs of a Cell-BE. We analyze the potential speedup given the inherent parallelism in the applications. While the potential speedup is significant in all benchmarks, the obtained speedup lags behind due to a mismatch between the micro-architectural properties of the accelerators and the benchmark properties.
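
    The potential-versus-obtained speedup analysis above is essentially an Amdahl's-law argument: the inherently parallel fraction of a benchmark bounds the speedup that any number of accelerator cores can deliver. A minimal sketch, with made-up numbers rather than the paper's measurements:

        #include <stdio.h>

        /* Amdahl's law: if a fraction p of the work is parallelizable
         * across n cores, the speedup is bounded by 1 / ((1 - p) + p / n).
         * The values below are illustrative, not the paper's data. */
        static double amdahl(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        int main(void) {
            double p = 0.90;  /* hypothetical parallel fraction of a benchmark */
            int    n = 6;     /* hypothetical number of SPUs used              */
            printf("potential speedup: %.2fx\n", amdahl(p, n));  /* 4.00x */
            return 0;
        }

    The micro-architectural mismatch identified in the abstract means the parallel part executes less efficiently on the SPUs than this idealized bound assumes, which is why the obtained speedup lags behind.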

    D4.2 Programming Language and Runtime System: Early Prototype (executive Summary)

    This document presents the executive summary of the deliverable on Programming Language and Runtime System: Early Prototype, which aims at describing the core functionality of the VINEYARD programming model and runtime system for accelerated data centres. We describe our approach to creating an abstract representation of accelerated kernels, such that application programmers can use these kernels without needing to worry about accelerator-specific calling conventions, or about the specific versions available in the VINEYARD accelerator library. The second key contribution of this document is the description of our approach to virtualizing accelerators. We assume that accelerators are assigned to jobs only when they are really needed, and not at job allocation time. This raises issues that need to be addressed in the virtualization layer and also in the application’s runtime. We describe these issues and our approach to solving them.
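
    A rough sketch of what an accelerator-agnostic kernel abstraction of this kind could look like is given below; the types and function names are hypothetical and do not reflect the actual VINEYARD interface.

        #include <stdio.h>
        #include <stddef.h>

        /* Hypothetical descriptor for an accelerated kernel: the application
         * names the kernel it wants and passes opaque arguments; the runtime
         * selects a concrete implementation from the accelerator library and
         * handles the accelerator-specific calling convention. */
        typedef struct {
            const char *name;    /* logical kernel name, e.g. "matmul" */
            void      **args;    /* opaque argument buffers            */
            size_t     *sizes;   /* argument sizes in bytes            */
            int         nargs;
        } kernel_request;

        /* Stub launch: a real runtime would bind a physical accelerator only
         * here, at launch time, rather than at job-allocation time. */
        static int runtime_launch(const kernel_request *req) {
            printf("launching '%s' with %d argument(s)\n", req->name, req->nargs);
            return 0;
        }

        int main(void) {
            float in[4] = {1, 2, 3, 4}, out[4] = {0};
            void *args[]   = { in, out };
            size_t sizes[] = { sizeof in, sizeof out };
            kernel_request req = { "vector_scale", args, sizes, 2 };
            return runtime_launch(&req);
        }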

    Reducing the burden of parallel loop schedulers for many-core processors


    An experimental study on performance portability of OpenCL kernels

    Accelerator processors allow energy-efficient computation at high performance, especially for computation-intensive applications. There exists a plethora of different accelerator architectures, such as GPUs and the Cell Broadband Engine. Each accelerator has its own programming language, but the recently introduced OpenCL language unifies accelerator programming languages. Hereby, OpenCL achieves functional portability, which reduces the development time of kernels. Functional portability, however, has limited value without performance portability: the possibility to re-use optimized kernels with good performance. This paper investigates how specific code optimizations are to the accelerator architecture and how severe the resulting lack of performance portability is.
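
    As a concrete example of functional portability, the minimal OpenCL C kernel below (a generic vector addition, not one of the paper's kernels; host-side setup omitted) compiles and runs on any OpenCL device.

        /* A functionally portable OpenCL C kernel: the same source builds
         * for GPUs, CPUs and other OpenCL devices. Device-specific tuning
         * such as work-group size, memory layout or explicit vectorization
         * is not expressed here, and that is exactly the kind of
         * architecture-specific optimization the paper investigates. */
        __kernel void vadd(__global const float *a,
                           __global const float *b,
                           __global float *c,
                           const unsigned int n)
        {
            size_t i = get_global_id(0);
            if (i < n)
                c[i] = a[i] + b[i];
        }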

    Parallel Programming of General-Purpose Programs Using Task-Based Programming Models

    The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBB attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchronization. It is not straightforward, however, to express all patterns of parallelism in these models. Pipelines are an important parallel construct, yet they are difficult to express in Cilk and TBB in a straightforward way without a verbose restructuring of the code. In this paper we demonstrate that pipeline parallelism can be easily and concisely expressed in a Cilk-like language, which we extend with input, output and input/output dependency types on procedure arguments, enforced at runtime by the scheduler. We evaluate our implementation on real applications and show that our Cilk-like scheduler, extended to track and enforce these dependencies, has performance comparable to Cilk++.
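
    The paper extends a Cilk-like language with dependency types on procedure arguments; that syntax is not reproduced here. As a rough analogue, the sketch below expresses a two-stage pipeline with OpenMP task depend clauses, where the scheduler likewise derives ordering from per-argument in/out annotations; produce() and transform() are hypothetical stages.

        /* compile with: cc -fopenmp pipeline.c */
        #include <stdio.h>
        #define N 8

        static int produce(int i)   { return i * i; }   /* pipeline stage 1 */
        static int transform(int x) { return x + 1; }   /* pipeline stage 2 */

        int main(void) {
            int data[N], result[N];

            #pragma omp parallel
            #pragma omp single
            {
                for (int i = 0; i < N; i++) {
                    /* stage 1 writes data[i]: an output dependence */
                    #pragma omp task depend(out: data[i])
                    data[i] = produce(i);

                    /* stage 2 reads data[i], writes result[i]: an input and
                     * an output dependence. Stage 2 of item i can overlap
                     * with stage 1 of item i+1 -- pipeline parallelism. */
                    #pragma omp task depend(in: data[i]) depend(out: result[i])
                    result[i] = transform(data[i]);
                }
                #pragma omp taskwait   /* wait for all pipeline tasks */
            }

            for (int i = 0; i < N; i++)
                printf("%d ", result[i]);
            printf("\n");
            return 0;
        }

    The point of the annotations, in both the sketch and the paper's model, is that the pipeline is declared through the data each stage reads and writes rather than hand-built from threads and locks.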